11th May 2021
Outline
Introduction
- Presentation of data
- Data wrangling
Materials and Methods
- Data visualization
- Logistic regression
- Principal Component Analysis
- K-means clustering
Results and discussion
Introduction
- Byar & Green prostate cancer data, from Andrews DF and Herzberg AM (1985)
- Compare four treatments: placebo and 0.2, 1.0 and 5.0 mg estrogen
- 502 observations of 18 variables
- 27 NA values
Introduction
Variables in the data set
Presentation of the variables as an R output, a table, or an image?
Data wrangling
Raw data -> Clean data
- Exclude dtime, sdate and sg
- Rename variables
Clean data -> Augment data
- Add five new variables: outcome, treatment_mg, EKG_lvl, performance_lvl, age_group
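The raw -> clean -> augment steps can be sketched in base R as follows. This is a minimal illustration on hypothetical toy rows: the column names `rx`, `status` and `age`, the new name `treatment`, the outcome coding (1 = died during follow-up) and the age cut-offs are all assumptions, and only three of the five new variables are shown. The project itself uses the tidyverse, so its actual code will look different.

```r
# Hypothetical toy rows: column names and codings are assumptions modelled on
# a typical prostate-trial table, not the project's actual input file
raw <- data.frame(
  rx     = c("placebo", "0.2 mg estrogen", "1.0 mg estrogen", "5.0 mg estrogen"),
  status = c("alive", "dead - prostatic ca", "alive", "dead - other ca"),
  age    = c(58, 71, 66, 80),
  dtime  = c(72, 10, 40, 2),
  sdate  = c(2778, 2827, 2936, 2996),
  sg     = c(8, 10, 9, 12),
  stringsAsFactors = FALSE
)

# Raw -> clean: drop the excluded columns, then rename (new name is hypothetical)
clean <- raw[, setdiff(names(raw), c("dtime", "sdate", "sg"))]
names(clean)[names(clean) == "rx"] <- "treatment"

# Clean -> augment: derive new variables (codings and cut-offs are assumptions)
aug <- clean
# outcome: 1 if the patient died during follow-up, else 0 (assumed coding)
aug$outcome <- as.integer(aug$status != "alive")
# treatment_mg: estrogen dose as a factor, with placebo as dose 0
dose <- c("placebo" = 0, "0.2 mg estrogen" = 0.2,
          "1.0 mg estrogen" = 1, "5.0 mg estrogen" = 5)
aug$treatment_mg <- factor(unname(dose[aug$treatment]))
# age_group: three bands, left-closed intervals [lower, upper)
aug$age_group <- cut(aug$age, breaks = c(-Inf, 60, 70, Inf),
                     labels = c("<60", "60-69", "70+"), right = FALSE)
aug
```

Coding the dose as a factor with levels `0`, `0.2`, `1` and `5` makes the lowest dose the reference level, which is consistent with the `treatment_mg0.2`, `treatment_mg1` and `treatment_mg5` terms in the regression output.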
Materials and Methods
Data visualization
Pre-treatment variables - Numeric

Data visualization
Pre-treatment variables - Categorical

Data visualization
Pre-treatment variables - Heatmap

Data visualization
Treatment, outcome and age

Logistic regression
Logistic regression
Model outcome as a function of treatment
Output:
log_mod_treatment
## # A tibble: 4 x 5
##   term            estimate std.error statistic     p.value
##   <chr>              <dbl>     <dbl>     <dbl>       <dbl>
## 1 (Intercept)       1.11       0.211     5.27  0.000000136
## 2 treatment_mg0.2   0.0907     0.301     0.301 0.763
## 3 treatment_mg1    -0.807      0.280    -2.88  0.00394
## 4 treatment_mg5    -0.0536     0.294    -0.182 0.855
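The estimates in the `log_mod_treatment` output are on the log-odds scale. A short sketch of how they translate into odds ratios, 95% Wald intervals and per-arm probabilities, using only the numbers printed in that output. It assumes the reference level is dose 0 (presumably placebo) and that `outcome == 1` is the modelled event; the exact coding is not shown in the slides.

```r
# Coefficients and standard errors copied from the log_mod_treatment output;
# dose 0 (presumably placebo) is assumed to be the reference level
est <- c(intercept = 1.11, mg0.2 = 0.0907, mg1 = -0.807, mg5 = -0.0536)
se  <- c(intercept = 0.211, mg0.2 = 0.301, mg1 = 0.280, mg5 = 0.294)

z <- qnorm(0.975)                        # ~1.96 for a 95% Wald interval
dose_est <- est[-1]; dose_se <- se[-1]   # drop the intercept

odds_ratio <- exp(dose_est)              # odds ratio vs. the reference level
ci_low  <- exp(dose_est - z * dose_se)
ci_high <- exp(dose_est + z * dose_se)

# Modelled probability that outcome == 1 in each arm (inverse logit)
p_arm <- plogis(est[["intercept"]] + c(reference = 0, dose_est))

round(cbind(odds_ratio, ci_low, ci_high), 3)
round(p_arm, 3)
```

With these numbers the 1.0 mg odds ratio is about 0.45 and its 95% interval (roughly 0.26 to 0.77) excludes 1, while the intervals for 0.2 mg and 5.0 mg include 1. This matches the p-values in the table.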
Logistic regression for each variable
Treatment 1.0 mg
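One way to run a logistic regression for each variable within the 1.0 mg arm is to loop over the predictors and fit `outcome ~ predictor` separately, collecting each slope and its p-value. This is a sketch on simulated data: the variable names, their distributions, and the choice of univariable (unadjusted) models are assumptions, since the slides do not state the exact model specification.

```r
set.seed(1)
n <- 120
# Simulated stand-in for the 1.0 mg arm; variable names are hypothetical
arm <- data.frame(
  tumor_size   = rexp(n, rate = 1 / 15),
  age          = round(rnorm(n, mean = 71, sd = 7)),
  weight_index = rnorm(n, mean = 100, sd = 13)
)
# Outcome simulated to depend on tumor size only, so the loop has a true signal
arm$outcome <- rbinom(n, 1, plogis(-1 + 0.06 * arm$tumor_size))

predictors <- c("tumor_size", "age", "weight_index")

# One univariable logistic regression per predictor: outcome ~ predictor
per_variable <- do.call(rbind, lapply(predictors, function(v) {
  fit <- glm(reformulate(v, response = "outcome"), family = binomial, data = arm)
  coefs <- summary(fit)$coefficients
  data.frame(term = v,
             estimate = coefs[v, "Estimate"],
             p.value  = coefs[v, "Pr(>|z|)"],
             stringsAsFactors = FALSE)
}))

# Sort so the most significant predictors come first
per_variable[order(per_variable$p.value), ]
```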

Logistic regression
Effects of significant variables for treatment 1.0 mg

Logistic regression
Distribution of significant variables for each outcome

Principal Component Analysis
Principal Component Analysis

Principal Component Analysis

Principal Component Analysis
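A minimal PCA sketch with base R's `prcomp()`, assuming the analysis runs on centred and scaled numeric pre-treatment variables. The data here are simulated, and the variable names are hypothetical labels that echo the measurements mentioned in the results.

```r
set.seed(42)
n <- 100
# Simulated numeric pre-treatment variables; names and distributions are
# assumptions for illustration, not the trial data
num_vars <- data.frame(
  tumor_size       = rnorm(n, mean = 15, sd = 10),
  acid_phosphatase = rlnorm(n, meanlog = 0, sdlog = 1),
  age              = rnorm(n, mean = 71, sd = 7),
  weight_index     = rnorm(n, mean = 100, sd = 13),
  hemoglobin       = rnorm(n, mean = 13.5, sd = 2)
)

# Centre and scale each variable so no single unit dominates the components
pca <- prcomp(num_vars, center = TRUE, scale. = TRUE)

# Proportion of total variance captured by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 3)

# Loadings (rotation) show how each variable contributes to PC1 and PC2
round(pca$rotation[, 1:2], 2)
```

Scaling matters here because the variables are on very different units (years, index values, enzyme levels). Without it, the variable with the largest variance would dominate the first component.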

K-means clustering

K-means clustering
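A sketch of K-means clustering on the first two principal components, using `stats::kmeans()` with several random starts. The two simulated groups stand in for something like stage 3 and stage 4 patients. Group sizes, means, variable names, and the choice of clustering in PC space with `centers = 2` are all assumptions for illustration.

```r
set.seed(7)
# Two simulated groups standing in for, e.g., stage 3 and stage 4 patients;
# group sizes, means and variable names are assumptions for illustration
grp <- rep(c("stage3", "stage4"), each = 50)
dat <- data.frame(
  tumor_size       = rnorm(100, mean = ifelse(grp == "stage3", 10, 25), sd = 5),
  acid_phosphatase = rnorm(100, mean = ifelse(grp == "stage3", 1, 4), sd = 1),
  age              = rnorm(100, mean = 71, sd = 7)
)

# Cluster on the first two principal components of the scaled data
pcs <- prcomp(dat, scale. = TRUE)$x[, 1:2]

# nstart = 25 random starts; kmeans() keeps the solution with the lowest
# total within-cluster sum of squares
km <- kmeans(pcs, centers = 2, nstart = 25)

# Cross-tabulate cluster labels against the simulated groups
table(cluster = km$cluster, group = grp)
```

Cluster numbers are arbitrary, so agreement with known groups should be judged from the cross-table rather than from the labels themselves.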

Results and discussion
- Stage 3 and 4 patients differ in tumor size and acid phosphatase levels
- Most effective treatment is 1.0 mg estrogen, the only dose with a significant coefficient (p = 0.004)
- Significant variables are tumor size, CVD, age, and weight index